Introduction

Since the beginning of the Landsat missions, the remote sensing community has been interested in developing universal algorithms for extracting water quality information from remotely sensed images [@Lots of old papers]. While the oceanic community has had significant success developing universal algorithms for chlorophyll, sediment, and DOC [cites], there is no inland water equivalent. Much of this discrepancy comes from the greater optical complexity of inland waters, which hinders the use of a universal algorithm, but progress on inland waters is further impeded by the lack of a shared dataset of satellite overpasses matched to in situ concentration information. Here we create and share the largest such overpass dataset ever assembled. We also outline and share our approach to bringing together three free, publicly available datasets to generate a high-graded, analysis-ready dataset for remote sensing of water quality. While a single universal algorithm may be an unattainable goal, we anticipate that this dataset will move us towards more universal approaches based on shared and equal access to overpass information.

Potential for transformative research with remote sensing of water quality

Despite this long-recognized potential, the general hydrology and limnology communities have, until recently, not integrated data from remote sensing of inland waters into their research approach [Topp]. Instead, these communities have focused much of their research on Eulerian sampling schemes, with sensors or people repeatedly sampling the same points in a river or lake [DoyleEnsign]. This approach has generated a wealth of information on temporal variability in inland waters, but far less work has examined spatial variability in rivers, lakes, and estuaries. Remote estimates of water quality in these ecosystems would allow for rapid assessment of potential algal blooms, detection of high-sediment waters, and analysis of spatio-temporal variability [cites].

Historic barriers

Serious citation of Topp, maybe none of this at all?

Modern solutions

With the profusion of publicly available in situ water quality datasets and the relatively easily accessible satellite mission archive

Methods

Landsat

Satellite   Years       Available images
5           1984-2012   192,688
7           1999-2018   188,781
8           2013-2018   58,585

WQP Parameters

Rivers, Lakes, and Estuaries/Deltas

Water Quality Portal

data pull and parameters therein

LAGOSNE

Describe LAGOSNE datasets

In Situ data unification

Joining Landsat and Water Quality Portal

Google Earth Engine

How we selected sites (Pekel occurrence)

Diagram of joining procedures and counts of observations dropped
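
As a placeholder for the joining diagram, the matchup step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: the records, the `match_overpasses` helper, and the one-day matching window are all hypothetical, and the text above does not specify the window actually used.

```python
from datetime import date, timedelta

# Hypothetical in situ samples: (site_id, sample_date, parameter, value)
in_situ = [
    ("site_a", date(2010, 6, 1), "TSS", 12.0),
    ("site_a", date(2010, 6, 20), "TSS", 8.5),
    ("site_b", date(2010, 6, 2), "DOC", 3.1),
]

# Hypothetical satellite overpasses: (site_id, overpass_date, satellite)
overpasses = [
    ("site_a", date(2010, 6, 2), 5),
    ("site_b", date(2010, 6, 2), 7),
]


def match_overpasses(samples, passes, window_days=1):
    """Pair each sample with the closest overpass at the same site.

    window_days is an assumed tolerance, not a value taken from the text.
    Returns (matched, dropped) so the number of lost observations
    can be reported alongside the matchups.
    """
    window = timedelta(days=window_days)

    # Index overpasses by site so each sample only scans its own site.
    passes_by_site = {}
    for site_id, pass_date, satellite in passes:
        passes_by_site.setdefault(site_id, []).append((pass_date, satellite))

    matched, dropped = [], []
    for sample in samples:
        site_id, sample_date, parameter, value = sample
        candidates = [
            (abs(pass_date - sample_date), pass_date, satellite)
            for pass_date, satellite in passes_by_site.get(site_id, [])
            if abs(pass_date - sample_date) <= window
        ]
        if candidates:
            # Keep the overpass closest in time to the sample.
            _, pass_date, satellite = min(candidates)
            matched.append(sample + (pass_date, satellite))
        else:
            dropped.append(sample)
    return matched, dropped


matched, dropped = match_overpasses(in_situ, overpasses)
print(f"matched: {len(matched)}, dropped: {len(dropped)}")
```

Returning the dropped samples together with the matches keeps the bookkeeping for "observations dropped" in the same step as the join, which is one way to feed the counts in the diagram.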

Data quality flagging

Not sure what to put here or if we should have this section

Results

For LAGOSNE data see here

Dataset description

Dataset generation

Map

Distribution of observations across the conterminous USA. The data are split by observation type, where "total" represents an overpass for any of the four primary parameters.

Distribution of observations per site

Observations over time

Variation captured by the datasets

Distributions of in situ vs matchup datasets

Observations lost

For both DOC and TSS, our matchup dataset is missing the long tail of data present in the in situ dataset. What kinds of sites were dropped to create this discrepancy? Nearly all of them are streams.

## 
## Estuary    Lake  Stream 
##       6      54   12665
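
A tally like the one above can be produced by comparing the sites in the in situ dataset against those that survive into the matchup dataset. The sketch below is a hypothetical illustration using only the Python standard library; the site identifiers, the `count_dropped_by_type` helper, and the toy data are assumptions and do not reproduce the counts reported here.

```python
from collections import Counter

# Hypothetical site table: site_id -> site type
site_types = {
    "s1": "Stream",
    "s2": "Stream",
    "s3": "Lake",
    "s4": "Estuary",
    "s5": "Stream",
}

# Hypothetical set of sites that have at least one matchup
matched_site_ids = {"s3", "s5"}


def count_dropped_by_type(types_by_site, matched_ids):
    """Count sites absent from the matchup dataset, grouped by site type."""
    return Counter(
        site_type
        for site_id, site_type in types_by_site.items()
        if site_id not in matched_ids
    )


dropped_counts = count_dropped_by_type(site_types, matched_site_ids)
for site_type in sorted(dropped_counts):
    print(site_type, dropped_counts[site_type])
```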

Spectral variation

Partitioning variation by sediment concentration and region

TSS covarying with other constituents

TSS impacts reflectance relationships for all constituent band combinations

Variation by ecoregion

Variation in TSS by ecoregion

Variation in spectral response by ecoregion

Supplementary

Ecoregion models

Mississippi Basin Example

Spectral medians captured at sample sites

Spectral variation captured by our circles

Lots of PCA analyses

Clusters mapped onto the TSS-DOC plot.